ABSTRACT

This paper presents a solution to the problem of scene estimation in motion video data within
the fuzzy set theoretic framework. Using fuzzy image feature extractors, a new algorithm
is developed that computes the change of information between each pair of successive frames in
order to classify scenes. This classification of raw input visual data can be used to establish
structure for correlation. The algorithm attempts to fulfill the need for non-linear,
frame-accurate access to video data in applications such as video editing and visual document
archival/retrieval systems in multimedia environments.


1. INTRODUCTION 

With rapid advancements in multimedia technology, it is increasingly common to have
time-varying data such as video as a computer data type. Existing database systems do not have
the capability to search within such information. Distinguishing one scene from another is
difficult because there are no precise markers that identify where scenes begin
and end. Moreover, the division of scenes can be subjective, especially if transitions are subtle.
One way to estimate scene transitions is to mathematically approximate the change of
information between each pair of successive frames by computing the distance between
their discriminatory properties. A fuzzy theoretic approach to image processing and
pattern recognition provides convenient methods for measuring such ambiguity or
uncertainty.


1.1 Fuzzy Image Concepts 

In classical image processing, a digital image of dimension M by N with L
gray levels represents each picture element, or pixel, as a spatial brightness function or
gray information. Using fuzzy notation, an image can be considered as an array of fuzzy
singletons, each having a membership value denoting its degree of brightness relative to
some brightness level $l$, where $l = 0, 1, 2, \ldots, L-1$. The fuzzy notation can be written as
follows:

$$X = \{\mu_X(x_{mn}) = \mu_{mn} / x_{mn};\ m = 1, 2, \ldots, M;\ n = 1, 2, \ldots, N\}$$

or

$$X = \bigcup_m \bigcup_n \mu_{mn} / x_{mn}, \quad m = 1, 2, \ldots, M;\ n = 1, 2, \ldots, N$$

where $\mu_X(x_{mn})$ or $\mu_{mn} / x_{mn}$ ($0 \le \mu_{mn} \le 1$) denotes the grade of possessing some
property (e.g., brightness, edginess, smoothness) by the (m,n)th pixel intensity $x_{mn}$.
In other words, a fuzzy subset of an image X is a mapping $\mu$ from X into [0,1] (Figure 1.1).
For any point $p \in X$, $\mu(p)$ is called the degree of membership of p in $\mu$ [11].



Figure 1.1: Fuzzy representation of an image X 
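
As a minimal illustration of this representation, the sketch below turns a gray-level image into an array of membership values. The linear brightness membership (gray level divided by L-1) is an assumption chosen for simplicity, and the function name `to_membership` is hypothetical; neither is taken from the paper.

```python
def to_membership(image, levels=256):
    """Map an M x N gray-level image (values 0..levels-1) to memberships in [0, 1].

    Assumption: brightness membership is linear in the gray level.
    """
    top = levels - 1
    return [[pixel / top for pixel in row] for row in image]


# A 2 x 2 image with 256 gray levels.
image = [[0, 255], [51, 204]]
mu = to_membership(image)
```

Each entry of `mu` plays the role of $\mu_{mn}$ for the pixel intensity $x_{mn}$ at the same position.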


2. IMAGE PROPERTIES 

There are many spatial and geometric properties or features that can be measured or
extracted from an image. They are used for pattern classification and scene analysis.
There is no trivial solution to selecting optimal features that would provide useful input
values to the classifier. The effectiveness of these feature extractors also depends on
the scenes being analyzed. For this paper, six operators for ambiguity and fuzzy geometric
measures are selected.

2.1 Ambiguity Measures 

The two ambiguity measures used are second-order local entropy and edginess. They
produce a measure of the structural information that exists in a given image. The entropy of
an image can be defined as a measure of the information (ambiguity) gain in a given
image. The edginess measures the coarseness of texture based on the average amount of
ambiguity present in a given image.

2.1.1 Second-order Local Entropy 

The calculation of the second-order local entropy uses a window that spans two
adjacent pixels. This window is used to compute the co-occurrence matrix, which
incorporates the dependency of the spatial distribution of gray levels. In this case, the
horizontal co-occurrence matrix is used. The probability of each co-occurrence
matrix entry is then calculated with



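$$p_{ij} = \frac{c[i,j]}{\sum_i \sum_j c[i,j]}$$

where $0 \le p_{ij} \le 1$ [12]. The second-order local entropy is then

$$H = -\sum_i \sum_j p_{ij} \log(p_{ij})$$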


The information gain is computed with a logarithmic function. As described in [8], this 
could be an exponential function. The co-occurrence matrix computation could also be 
modified with a combination of horizontal and vertical directions for a more accurate 
measure of the spatial distributions. 
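
The computation described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each probability is the horizontal co-occurrence count of a gray-level pair divided by the total number of counted pairs, that the entropy is the negative sum of p log(p) with a natural logarithm, and that empty entries contribute nothing. The function names are hypothetical.

```python
import math


def horizontal_cooccurrence(image, levels):
    """Count how often gray level i is immediately followed by gray level j in a row."""
    c = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            c[left][right] += 1
    return c


def second_order_entropy(image, levels):
    """Entropy of the normalized horizontal co-occurrence matrix (natural log)."""
    c = horizontal_cooccurrence(image, levels)
    total = sum(sum(row) for row in c)
    if total == 0:
        return 0.0  # image has no horizontal pixel pairs
    h = 0.0
    for row in c:
        for count in row:
            if count:  # skip empty entries, treating 0 * log(0) as 0
                p = count / total
                h -= p * math.log(p)
    return h
```

A uniform image yields an entropy of zero, while an image whose adjacent gray-level pairs are spread evenly over the matrix yields the largest value.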


2.1.2 Edginess 

This image property measures edge information in order to detect edge intensities in an input
image. Note that this differs from gradient-based edge detectors. It calculates
the edge ambiguity using a localized window to find the boundary between the current
pixel and its neighboring pixels [12].

In the equation

$$\delta(X) = [1 - I(X)]^{\beta},$$

$I(X)$ stands for the ambiguity measure, or the index of fuzziness, and $\beta$ is a positive
constant. The spatially dependent membership function must be computed first.

0.5 

MO= 1 

xyi 

1 U 

where $N_j$ represents the dimensions of the $i$ by $j$ window, i.e. $N_j = i \cdot j$. These are the
neighboring pixels of the point (m, n). As shown in Figure 2.1, the linear index of
fuzziness, $I(X)$, can be defined as follows:

$$I(X) = \frac{2}{n} \sum_i \min(\mu_X(x_i),\ 1 - \mu_X(x_i)),$$

where $n$ is the number of pixels $x_i$.

Figure 2.1: The linear index of fuzziness

Other measures of fuzziness, such as the quadratic index of fuzziness [6], fuzzy entropy
[2], and index of non-fuzziness (crispness) [12], could also be used for the edginess
measure.
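
A minimal sketch of the edginess computation is given here for illustration. It assumes the membership values have already been computed by the spatially dependent membership function, which is not reproduced in the code. It uses the standard 2/n normalization for the linear index of fuzziness, and the default value of `beta` is an arbitrary choice. The function names are hypothetical.

```python
def linear_index_of_fuzziness(mu):
    """Linear index of fuzziness of a 2-D array of membership values in [0, 1]."""
    values = [m for row in mu for m in row]
    return 2.0 / len(values) * sum(min(m, 1.0 - m) for m in values)


def edginess(mu, beta=1.0):
    """Edginess as [1 - I(X)] ** beta, with beta a positive constant (assumed default)."""
    return (1.0 - linear_index_of_fuzziness(mu)) ** beta
```

A crisp membership array (all values 0 or 1) gives an index of 0 and an edginess of 1, while an array of values all equal to 0.5 gives an index of 1 and an edginess of 0.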

2.2 Fuzzy Geometric Measures

Geometric measures define surfaces, shapes, solids, and boundaries of objects. Rosenfeld 
[13] and Rosenfeld and Haber [15] incorporated the fuzzy theoretic approach to the 
classical geometric measures and generalized some of the standard geometric properties of 
the relationships among regions to fuzzy sets [10]. Of these many measures, the primitive 
measures, such as area and perimeter, orientation measures, and shape measures are 
applied here. 

The remaining methods that were applied, namely fuzzy geometrical properties, were 
extensions of the traditional geometrical measure concepts to operate in the fuzzy set 
framework. These measures examine various geometrical properties and relations such as 
area, perimeter, length, height, breadth, width, compactness, and elongatedness. There 
are many other topological concepts such as connectedness, major and minor axis, and 
adjacency, which could have been utilized in this study. These fuzzy measures are the 
basis for measuring spatial, gray, and region ambiguities. 

2.2.1 Area

The area is an integral taken over the fuzzy image subset, i.e. $\int \mu(x)$. For a digital image,
it is computed by summing the spatial brightness values of all image pixels. This spatial
brightness value function is treated as the fuzzy membership function [11, 14].

$$\mathrm{area}(\mu) = \sum_x \mu(x)$$

2.2.2 Perimeter

The perimeter of an image is defined as the circumferential distance around the boundary.
Using a faster method of computation, it can be computed as the sum of the product of
the co-occurrence matrix and the difference of two adjacent pixels [11].

$$\mathrm{perimeter}(\mu) = \sum_i \sum_j c[i,j]\,|\mu(i) - \mu(j)|$$

where $i = 1, 2, \ldots, L$ and $j = 1, 2, \ldots, L$.
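
The area and perimeter computations can be sketched as follows. This is a hedged illustration under several assumptions that the paper does not state: gray levels are indexed from 0 to L-1 rather than 1 to L, the membership of gray level i is i/(L-1), and the co-occurrence matrix counts both horizontally and vertically adjacent pixel pairs. The function names are hypothetical.

```python
def area(mu):
    """Fuzzy area: the sum of all membership values."""
    return sum(sum(row) for row in mu)


def adjacency_counts(image, levels):
    """c[i][j] = number of horizontally or vertically adjacent pairs with levels (i, j)."""
    c = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for m in range(rows):
        for n in range(cols):
            if n + 1 < cols:  # right-hand neighbor
                c[image[m][n]][image[m][n + 1]] += 1
            if m + 1 < rows:  # neighbor below
                c[image[m][n]][image[m + 1][n]] += 1
    return c


def perimeter(image, levels):
    """Fuzzy perimeter: sum over level pairs of c[i][j] * |mu(i) - mu(j)|."""
    top = levels - 1  # assumed linear membership mu(i) = i / (levels - 1)
    c = adjacency_counts(image, levels)
    return sum(c[i][j] * abs(i - j) / top
               for i in range(levels) for j in range(levels))
```

Working on the L by L matrix instead of every pixel pair is what makes this formulation fast: the cost of the final sum depends on the number of gray levels, not on the image size.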


2.2.3 Length

The length of an image is calculated by finding the longest extent in the column direction
[11, 14].

$$\mathrm{length}(\mu) = \max_m \left( \sum_n \mu_{mn} \right)$$


2.2.4 Height 

The height of an image is another way of measuring its extent by summing the maximum
membership values of each row [11, 14].

$$\mathrm{height}(\mu) = \sum_n \max_m \mu_{mn}$$


2.2.5 Breadth 


The breadth of an image measures the longest extent in the row direction [11, 14].

$$\mathrm{breadth}(\mu) = \max_n \left( \sum_m \mu_{mn} \right)$$


2.2.6 Width 


The width is calculated as the sum of maximum membership values of each column [11,
14].

$$\mathrm{width}(\mu) = \sum_m \max_n \mu_{mn}$$
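
The four extent measures can be sketched together. The sketch assumes the membership array is stored as `mu[m][n]`. It takes length as the maximum over m of the sum over n, height as the sum over n of the maximum over m, breadth as the maximum over n of the sum over m, and width as the sum over m of the maximum over n. The function names are hypothetical.

```python
def length(mu):
    """Maximum over m of the sum of memberships over n."""
    return max(sum(row) for row in mu)


def height(mu):
    """Sum over n of the maximum membership over m."""
    return sum(max(col) for col in zip(*mu))


def breadth(mu):
    """Maximum over n of the sum of memberships over m."""
    return max(sum(col) for col in zip(*mu))


def width(mu):
    """Sum over m of the maximum membership over n."""
    return sum(max(row) for row in mu)


mu = [[0.5, 1.0],
      [0.25, 0.0]]
measures = (length(mu), height(mu), breadth(mu), width(mu))
```

Under these definitions length never exceeds height and breadth never exceeds width, since a maximum of sums cannot exceed the corresponding sum of maxima.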


2.3 Orientations

The horizontal and vertical orientation of an image can be measured as follows [11]:

If $\mathrm{length}(\mu) / \mathrm{height}(\mu) < 1$, then vertically oriented.

If $\mathrm{breadth}(\mu) / \mathrm{width}(\mu) < 1$, then horizontally oriented.


2.4 Shape Measures 

Shape measures can be computed using geometrical properties of a given image. These
measures can also be defined independently of size measurements [16]. They
represent the profile and physical structure of an image or image subsets. Two fuzzy
measures are used: compactness and index of area coverage.

2.4.1 Compactness

The compactness measures the property of circularity [11].

$$\mathrm{comp}(\mu) = \frac{\mathrm{area}(\mu)}{(\mathrm{perimeter}(\mu))^2}$$


2.4.2 Index of Area Coverage 

The index of area coverage (IOAC) is the fraction of the maximum area (that can be
covered by the length and breadth of the image) actually covered by the image [11].

$$\mathrm{IOAC}(\mu) = \frac{\mathrm{area}(\mu)}{\mathrm{length}(\mu) \cdot \mathrm{breadth}(\mu)}$$
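
Both shape measures are simple ratios of the primitive measures, as the following sketch shows. It assumes the area, perimeter, length, and breadth have already been computed and are non-zero. The function names are hypothetical.

```python
def compactness(area, perimeter):
    """Compactness: area divided by the squared perimeter."""
    return area / perimeter ** 2


def ioac(area, length, breadth):
    """Index of area coverage: area divided by length times breadth."""
    return area / (length * breadth)


# Example with illustrative values for the primitive measures.
comp_value = compactness(3.0, 2.0)
ioac_value = ioac(3.0, 2.0, 2.0)
```

Because both measures are ratios of an area to a squared extent, scaling the image geometry changes the numerator and the denominator by the same factor, which is why shape measures can be defined independently of size.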


3. SCENE ESTIMATION 

As discussed in [12], the criterion for a good feature is that it should be invariant to
within-class variation while emphasizing differences that are important in discriminating between
patterns of different types. It is difficult to determine an optimal feature space comprising
a set of image properties that would produce factors significant to the
classification decision. The approach taken here is to select a fixed set of
image properties, namely ambiguity, size, orientation, and shape measures, and then to
map all images into this pre-determined feature space.


Figure 3.1: Fuzzy Image Feature Vector

Figure 3.1 depicts the sampled feature space having three features
and how the distance, $|d_1|$, between two successive frames can be calculated with the vector
operation $|i_1 - i_2|$. Because the goal is to analyze motion, this calculation of the change of
image constituents from frame to frame in a given time series gives the sampled mean and
the sampled variance of all image features. By giving smaller weights to features having
larger variance, the important features with small variance have more influence in the
decision making process. This is discussed in [12] as a useful clustering technique that
maximizes the inter-set distance or minimizes the intra-set distance using a diagonal
transformation, on the grounds that features having larger variance are less reliable.

3.1 Distance Computation 

Before the applied mathematical terms are discussed, the following nomenclature needs to
be described.

M    Total number of frames or images
m    Last frame number, where m = M-1
N    Total number of features or properties
i    Index of the current image at time t, where i = 0, 1, ..., m
k    Index of the next image at time t+1, where k = 1, 2, ..., m-1


The sampled mean for the $j$th feature element is given by

$$\bar{f}_j = \frac{1}{M} \sum_{i=0}^{m} f_{ij}, \quad \text{where } j = 1, 2, \ldots, n.$$

Mnemonically, the index of feature element $j$, where $j = 1, 2, \ldots, n$, can be represented by
the following enumerated terms: edginess, entropy, compactness, ioac, l/h, and b/w,
respectively (e.g. $f_{entropy}$). To standardize all sampled mean values to 0.5, the following
conversion is performed. This gives equal salience to all features for distance computation
[3].

$$f_{ij}^{norm} = 0.5 \, \frac{f_{ij}}{\bar{f}_j}$$

Consequently, this standardization sets every $\bar{f}_j$ to 0.5. The sampled
variance for the $j$th feature element is computed as

$$\sigma_j^2 = \frac{1}{m-1} \sum_{i=0}^{m} (f_{ij} - \bar{f}_j)^2, \quad \text{where } j = 1, 2, \ldots, n.$$

The magnitude of the normalized distance between two successive frames i and k,
$|d_{ik}^{norm}|$, is computed as given in [18].
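
The standardization and distance steps can be sketched as follows. The mean normalization follows the text: each feature is scaled so that its sampled mean becomes 0.5. The distance itself is an assumption: the sketch uses a Euclidean distance in which each squared difference is divided by the feature's sample variance, so that high-variance features receive smaller weights. The variance here uses the conventional divisor (number of frames minus one). All function names are hypothetical, and the code assumes at least two frames with non-zero means and variances.

```python
import math


def column_means(frames):
    """Sample mean of each feature over all frames."""
    count = len(frames)
    return [sum(f[j] for f in frames) / count for j in range(len(frames[0]))]


def normalize_features(frames):
    """Scale each feature so that its sample mean becomes 0.5."""
    means = column_means(frames)
    return [[0.5 * f[j] / means[j] for j in range(len(f))] for f in frames]


def sample_variances(frames):
    """Sample variance of each feature (divisor: number of frames minus one)."""
    means = column_means(frames)
    count = len(frames)
    return [sum((f[j] - means[j]) ** 2 for f in frames) / (count - 1)
            for j in range(len(frames[0]))]


def weighted_distance(a, b, variances):
    """Assumed form: Euclidean distance with inverse-variance weights."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))


frames = [[1.0, 10.0], [3.0, 30.0]]           # two frames, two features
norm = normalize_features(frames)              # every feature now has mean 0.5
var = sample_variances(norm)
dist = weighted_distance(norm[0], norm[1], var)
```

After normalization the two features in the example become identical even though their raw scales differ by a factor of ten, which is the equal salience that the standardization is meant to provide.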



4. EXPERIMENTS 

Based on the above formulas, a schematic diagram (Figure 4.1) can be drawn to describe 
the process of feature selection and frame selection. 



Figure 4.1: Schematic diagram of feature selection process 

The distance between two frames in the aforementioned three-feature space is computed
to check their similarity. If this distance is larger than a predetermined threshold value,
then the current video frame is considered to be significantly different from the previous
frame, and therefore needs to be registered or stored as one of the abstract keys (Figure
4.2).



Figure 4.2: Schematic diagram of the frame selection process 
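
The frame selection rule can be sketched as a simple threshold test. The sketch assumes the first frame is always registered as a key, which the paper does not state, and the function name and threshold value are hypothetical.

```python
def select_key_frames(distances, threshold):
    """Return indices of frames registered as abstract keys.

    distances[i] is the distance between frame i and frame i + 1.
    Assumption: frame 0 is always registered.
    """
    keys = [0]
    for i, d in enumerate(distances):
        if d > threshold:  # frame i + 1 differs significantly from frame i
            keys.append(i + 1)
    return keys


keys = select_key_frames([0.1, 0.9, 0.2, 1.5], threshold=0.5)
```

The threshold is problem-dependent: a lower value registers more frames as keys, and a higher value registers fewer.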

4.2 Input Data 

Movie film projectors display 24 frames per second, whereas NTSC standard television
and video devices display 30 frames per second to achieve continuous and fluid
full-motion images. The change of inter-frame information is gradual at such high frame rates.
For storage conservation and computational efficiency, the simplest way to reduce or
abstract video data is to sample it at a lower frame rate.

In this paper, a time-suppressed frame rate of one frame per 5 seconds was assumed. A set of
digitized videos of previous space shuttle missions obtained from NASA/JSC was used
(Figure 4.3). After a pre-processing step, each frame was stored in the CompuServe
Graphics Interchange Format (GIF) for portability.



Figure 4.3: Experimental input data 

With the fuzzy measures, the resulting distances between each pair of successive frames are
shown in Figures 4.4 through 4.6. The abscissa indexes the frame
distances in the sampled time series, while the ordinate is the computed distance value
between two successive images, i.e. $|i_j - i_{j+1}|$. For example, the abscissa index 0
represents $|i_0 - i_1|$, 1 represents $|i_1 - i_2|$, and so on. Each scene consists of six frames;
therefore, there is a change of scene at every sixth index on the abscissa. The scene
separation is denoted with vertical grid lines. Three sets of features were tested
as follows:

(1) Entropy, Compactness, L/H (Figure 4.4)

(2) Edginess, IOAC, B/W (Figure 4.5), and

(3) All of the above (Figure 4.6).

Figure 4.6: Detect 3 - All six features

Note that combining all features does not necessarily produce better results merely
because there are more features. It is not the quantity that is critical, but the
discriminatory quality of the features.


5. SUMMARY 

The technique discussed here needs further improvement. It must have a classifier to
correctly cluster the frames into the appropriate scenes. Both statistical and fuzzy
pattern classifiers are being explored. Video frames that are to be classified are
temporal and dynamic data, so non-linear classification methods need to be
implemented. Scene classification is quite subjective in nature; therefore, the interactive
tool developed here can be further extended to provide human interaction in setting
problem-dependent criteria for this machine recognition task. Furthermore, the scenes
that are detected may not necessarily be different from one another, but rather compose a
video segment or document. A hierarchical abstraction scheme that allows for a higher
level of abstraction will better suit the visual data management environment.

Finally, in the merging worlds of computers and media, new technologies mix traditional
media such as video and publications with computer media such as interactive, informational,
and entertainment software. This trend is growing at an unprecedented rate. Once
digital video becomes a repository of common data on computers, the data needs to be
accessed and manipulated just as documents are retrieved and managed by a DBMS. It
might be useful to investigate new video inter-referencing strategies for correlating various
contexts from the same event to derive knowledge points. Thus, this automatic abstraction
of video index keys for non-linear, frame-accurate access would make information
archival and retrieval applications more robust and efficient.